4.6 Set cover problem

Problem:
Suppose we are going to form a team to accomplish a mission. Team members are selected from a group of candidates each of whom specializes in several skills (such as reconnaissance, combat, medical support, etc). The objective is to make the team as small as possible while satisfying that every required skill is owned by at least one team member. And this is one situation that needs an algorithm for set cover problem.
Formally, the (un-weighted) set cover problem is addressed as following [3]. We are given a set U of m elements (called the universe and assumed to be {1,2,…,m}) and n subsets S_1,S_2,…S_n where each S_i ⊆ U. Given that the union of all n subsets equals U, the goal is to find out a minimum number of subsets so that the union of these subsets still covers U.

Solution:

The problem is known to be NP-hard. Other than looking for the optimal solution with exponential time, the following will introduce a \ln n-approximate greedy algorithm with polynomial time complexity. The strategy is that, at each step, one always selects the subset which contains the largest number of uncovered elements.

Algorithm

Input: universe U = {1 , 2 , … , m }, subsets S = {S_1,S_2,…,S_n}
Output: set C of subsets whose union covers {1 , 2 , … , m}
C = Ø
While U ≠ Ø
                X = S_i from S such that |S_i ∩ U| is maximized
                Remove X from S
                C = C ∪ { X }
                U = U \ X
Return C

Correctness:
The following shows that the algorithm is \ln n-approximate with the proof approach from [1]. Another analysis could be found in [2], which gives a better (\lnB+1) approximate ratio where B is the size of the largest subset.

Theorem 1: The greedy algorithm for set cover problem is \ln n-approximate.
Proof. Let k be the number of subsets selected by the optimal solution. Let U_t be the set of uncovered elements after the greedy algorithm has done t iterations. Define n_t = |U_t|. Since the k subsets in optimal solution must cover U_t, there exists a subset B containing at least n_t/k elements of U_t. Note that the greedy algorithm has not yet selected B (otherwise the uncovered elements will not be U_t), so at (t+1)-th iteration the greedy algorithm will be able to select a subset with at least n_t/k uncovered elements. Then the number of uncovered elements after (t+1) iterations becomes n_{t+1} ≤ n_t – n_t/k = n_t(1-1/k). Solving this inequality yields n_t ≤ n_0(1-1/k)^t = n (1-1/k)^t < n e^{-t/k} (the last inequality comes from: 1-x≤e^{-x}  for all x, and equality holds only when x=0). Then after T = k\ln n iterations, n_T will be less than 1, which means n_T=0 and there is no element left. Thus the approximation ratio is T/ k = ln ( n ).

References
[1] Dasgupta, S., Papadimitriou, C. H., and Vazirani, U. V. Algorithms. McGraw-Hill Higher Education, 2006.
[2] V. Chvatal. A greedy heuristic for the set-covering problem. Mathematics of Operations Research, 4(3):pp. 233-235, 1979.
[3] http://en.wikipedia.org/wiki/Set_cover_problem

© The University of Hong Kong Algorithms Library - hkual@cs.hku.hk